-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: halt-height behavior is not deterministic #16639
Conversation
Solution: - make sure in state machine that we don't execute block beyond halt-height
baseapp/abci.go
Outdated
func (app *BaseApp) FinalizeBlock(req *abci.RequestFinalizeBlock) (*abci.ResponseFinalizeBlock, error) { | ||
var events []abci.Event | ||
|
||
// don't execute blocks beyond the halt height | ||
var halt bool | ||
switch { | ||
case app.haltHeight > 0 && uint64(req.Height) > app.haltHeight: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change potentially affects state.
Call sequence:
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).FinalizeBlock (baseapp/abci.go:648)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I would like a test case to be added in order to make sure that in case of refactors of the baseapp code we don't miss behaviours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite follow. If I say, for example, halt-height == 1000
, then when Commit
is called indicating block completion, the process will exit after 1000. How is it non-deterministic? Why is the logic duplicated in FinalizeBlock
?
because sending signal to self process is asynchronous, the current process can run for a while before the signal is handled and stop the node, I seems to observe that when I use |
Mhhh, so |
I think it only calls os.Exit if signal sending fails |
I'm sorry, I don't follow. When the https://github.com/cosmos/cosmos-sdk/blob/v0.47.3/baseapp/abci.go#L479-L485 |
You can check the internal of halt method, it don't call os.Exit unconditionally |
I don't understand what this means. |
os.Exit() is not called if signal sending is successful, it's a fallback if signal sending failed, it's not ideal because the shutdown would not be graceful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and thank you @yihuang! Would be nice though to have a test to lock-in this behavior to prevent future regressions. It is critical code yet tomorrow I can sneakishly delete it and no tests will fail.
Wow I didn't see the PID logic -- completely forgot about that. I believe I originally wrote this code, but I can't recall why not just resort to |
maybe for graceful shutdown, although right now we don't have critical stuff to cleanup, but for async commit (memiavl is doing that already), it'll be helpful to wait for it to complete before shutdown. |
no problem, I just feel the solution is not ideal yet, first of all, there are code duplication between |
Yes, a panic in |
done, I think the equivalence of panic in latest abci is just simply returning errors. |
|
||
if err := app.checkHalt(req.Height, req.Time); err != nil { | ||
return nil, err | ||
} | ||
|
||
if err := app.validateFinalizeBlockHeight(req); err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change potentially affects state.
Call sequence:
(*github.com/cosmos/cosmos-sdk/baseapp.BaseApp).FinalizeBlock (baseapp/abci.go:649)
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com> (cherry picked from commit c0e9e8c) # Conflicts: # CHANGELOG.md
Solution: - port the fix from sdk: cosmos/cosmos-sdk#16639
* Problem: halt-height is not deterministic Solution: - port the fix from sdk: cosmos/cosmos-sdk#16639 * Update CHANGELOG.md Signed-off-by: yihuang <huang@crypto.com> --------- Signed-off-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Port of cosmos#16639 Co-authored-by: yihuang <huang@crypto.com>
Description
Closes: #16638
Solution:
Author Checklist
All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.
I have...
!
to the type prefix if API or client breaking changeCHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.
I have...
!
in the type prefix if API or client breaking change